Delivering Data

As we saw in the EDA section, creating an exploratory chart is quite easy, both with Altair and Google Sheets.

Creating a Chart With Altair vs Datawrapper

But when it comes to creating a final visualization for public consumption, Altair is much more powerful. For example, we can use this code to add colours to our frequency chart (bar chart):

alt.Chart(permits, width=550, height=400).mark_bar().encode(
    x=alt.X("WARD:N", sort="y"),
    y=alt.Y('count()'),
    color="WARD:N")

Before creating our chart with the code above, let’s not forget to import the required libraries into this Jupyter Notebook.

import altair as alt
import pandas as pd
from myst_nb import glue

We also need to import our dataset using pd.read_csv, just like we did in our understading-data-vimo.ipynb notebook, since each notebook is independent.

permits = pd.read_csv('https://raw.githubusercontent.com/jsmarier/course-datasets/main/ottawa-building-permits-2021.csv')

Lastly, we need to lift Altair’s default limit of 5,000 rows, since our dataset features more rows than that. Otherwise, Altair raises a MaxRowsError when we try to draw the chart.

alt.data_transformers.disable_max_rows()
# Output: DataTransformerRegistry.enable('default')

We are now ready to create our chart!

ward_bar_colours = alt.Chart(permits, width=550, height=400).mark_bar().encode(
                    x=alt.X("WARD:N", sort="y"),
                    y=alt.Y('count()'),
                    color="WARD:N")

glue("ward_bar_colours_fig", ward_bar_colours, display=False)

Fig. 14 I added some colours to this bar chart with the color property.

While I personally use Google Sheets’ chart tool for my EDAs, I prefer to create my final visualizations with tools such as Datawrapper. An important drawback of Datawrapper compared to Altair is that we must first prepare the data, since Datawrapper doesn’t play well with raw datasets.

It prefers aggregate data, such as what we can create with a pivot table. Let’s start with a pivot table calculating the number of projects per ward. We will put WARD under the rows, and again under the values, making sure to summarize by COUNTA, which means that Google Sheets will count how many times each ward appears in the WARD column. This is, more or less, akin to using .groupby() in pandas.

_images/2022-12-11_ass-8_pivot-table-for-visualization.png

Fig. 15 Screen capture of the pivot table used to prepare the data for Datawrapper.
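For comparison, here is a minimal sketch of the same aggregation in pandas. It assumes a tiny, made-up stand-in for the permits dataset (the ward names and the "Number of Projects" column label are hypothetical), and it simply counts the rows in each ward, which is roughly what COUNTA does in the pivot table.

```python
import pandas as pd

# Hypothetical stand-in for the permits dataset: one row per project.
sample_permits = pd.DataFrame(
    {"WARD": ["Ward 1", "Ward 2", "Ward 1", "Ward 3", "Ward 1", "Ward 2"]}
)

# Count how many times each ward appears, much like COUNTA in the pivot table.
ward_counts = (
    sample_permits.groupby("WARD")
    .size()
    .reset_index(name="Number of Projects")
)

# Two-column CSV text that could be pasted into a tool expecting aggregate data.
csv_text = ward_counts.to_csv(index=False)
print(csv_text)
```

With the real dataset, replacing sample_permits with permits should give one row per ward, ready to copy into a charting tool.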

Then, I need to copy and paste the data into the first screen on Datawrapper. After that, I need to manually tell the program (under “Check & Describe”) that the second column includes numbers. By default, it parsed it as a nominal column. The third (“Visualize”) and fourth (“Publish & Embed”) steps allow us to customize and share our visualization. The animated image below offers an overview of these steps.

_images/2022-12-11_ass-8_datawrapper-create.gif

Fig. 16 Animated gif of the Datawrapper workflow.

Here, I would argue that Altair is much faster to use than Datawrapper to create a visualization.

Annotating a Chart With Altair vs Datawrapper

But what about customizing a visualization? Does Altair still win? Let’s find out by trying to add a title. We will use this code to generate the same chart as before, but with a title and axis labels:

alt.Chart(permits, width=550, height=400).mark_bar().encode(
    x=alt.X("WARD:N", sort="y", title="Ward"),
    y=alt.Y('count()', title="Number of Projects"),
    color="WARD:N"
).properties(
    title="Number of Projects Per Ward"
).configure_title(fontSize=20, anchor="middle")

ward_bar_title = alt.Chart(permits, width=550, height=400).mark_bar().encode(
    x=alt.X("WARD:N", sort="y", title="Ward"),
    y=alt.Y('count()', title="Number of Projects"),
    color="WARD:N"
).properties(
    title="Number of Projects Per Ward"
).configure_title(fontSize=20, anchor="middle")

glue("ward_bar_title_fig", ward_bar_title, display=False)

Fig. 17 I added a title to this bar chart with .properties and styled it with .configure_title.

In comparison, Datawrapper offers an easy-to-use Annotation feature to add a title, a description, notes, a byline, etc. However, adding axis labels, changing the colour of individual columns, etc. requires users to navigate across multiple menu items.

_images/2022-12-11_ass-8_datawrapper_annotate.png

Fig. 18 Screen capture of Datawrapper’s annotation tool.

Again, this is just a proof of concept. In a real data science context, the colour scheme should be informative and serve a purpose other than merely making the data visualization “colourful.” Nonetheless, we see that, once we understand its grammar, Altair can actually be much faster to use.

Other Math Equations

The instructions require me to include two block math equations and a total of five executable code cells in this Jupyter Notebook. I will therefore discuss how to calculate a quotient and do a subtraction.

Quotient

\(\LaTeX\)

\[\frac{x_1}{x_2} \tag{3}\]

Python

print(x1 / x2)

Subtraction

\(\LaTeX\)

\[x_1 - x_2 \tag{4}\]

Python

print(x1 - x2)

Let’s run the two Python expressions in code cells to see the results. We can simply type the two numbers, separated by the required operator, such as 9 / 8 for equation (3) or 9 - 8 for equation (4). Note that print will “burn” the answer into our notebook’s output. (But the expression also works without print, since a notebook displays the value of the last expression in a cell.)

print(9 / 8)
1.125
print(9 - 8)
1
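The templates print(x1 / x2) and print(x1 - x2) only run once x1 and x2 exist. Here is a minimal sketch that defines them first, assuming the same values (9 and 8) used in the cells above; the variable names quotient and difference are my own.

```python
# Assumed values, matching the numbers used in the cells above.
x1 = 9
x2 = 8

# The / operator always returns a float in Python 3.
quotient = x1 / x2

# Subtracting two integers returns an integer.
difference = x1 - x2

print(quotient)    # 1.125
print(difference)  # 1
```

Storing the results in variables, rather than only printing them, makes it possible to reuse them in later cells.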